Propagation


A Journal of Science Communication 










VOLUME 3, NO. 1, JANUARY, 2012 


Designing an Automated System for Medical Diagnosis 


Abstract 


This paper proposes an automated system for
recognizing disease conditions of human skin in
the context of health informatics. Skin texture images,
displaying three dermatological skin conditions, are
analyzed using a texture analysis technique based on a
set of normalized symmetrical Grey Level Co-occurrence
Matrices (GLCM), and features are
extracted from them using automated algorithms. The
features are fed to neural network classifiers for
identification of the disease type. The features are
considered in various combinations, viz. individually
and in joint 2-D and 3-D feature spaces, to find the best
recognition accuracies.


1. Introduction


In recent years, computer vision methodologies have
been applied to the fields of health informatics and
telemedicine for automated diagnosis of diseases.
Interest in automated health diagnosis has been
triggered by the huge collection of medical images
generated every day all over the world. For example,
the Radiology Department of the University Hospital
of Geneva alone produces more than 12,000 images a
day [Muller, 2004]. Automated diagnosis measures
have shown great potential for reducing diagnostic
errors and improving the accuracy and efficiency of
medical diagnosis. Diagnostic errors have a huge
negative impact on patient care, such as an incremental
cost per patient and an increase in hospital stay, as reported
in a Harvard study [Bates, 1997]. In the United States
alone, medical error results in 44,000-98,000
unnecessary deaths and 1,000,000 excess
injuries each year [Weingart, 2000]. Ironically, most diagnostic
errors are preventable. Research shows that diagnostic
errors often occur when clinicians are inexperienced
and when new procedures are introduced. Further, age,
complex care, urgent care, and prolonged hospital stay
have been found to be correlated with diagnostic
errors. Application of automated information systems
in medical analysis has shown great promise in
reducing such errors [Copec, 2003].


Automated diagnostic systems based on medical
imaging work by using image processing
techniques to recognize and differentiate disease
characteristics from digital images. Two important
steps are used: (1) visual features are extracted from
the images and represented using a mathematical data
model; (2) the data representation is then fed to a
statistical classifier, such as a neural network, for
identification and classification of the disease.


Image features usually involve the following, either
individually or in various combinations: color, texture
and shape. In this paper we focus on using texture as a
means of identifying skin diseases from medical
images using an automated procedure. Texture refers
to visual patterns describing the variation of color
or grey tones over the image. The choice of
features depends on which characteristics of the image
best describe the
diseases in question. In many cases computer system
designers need help from medical professionals
to isolate the set of features most useful in
describing a specific disease condition. The
organization of the paper is as follows: section 2
provides an overview of the related work, section 3
outlines the proposed approach with discussions on
the overview, feature computation and classification
schemes, section 4 provides details of the dataset and
experimental results obtained, section 5 provides an
analysis of the current work vis-a-vis other related
works, and section 6 provides the overall conclusion
and scope for further research.


2. Related Works 


Many methodologies have been proposed to analyze 
and recognize textures and shapes in an automated 
fashion. One of the first studies involved derivation of 
texture energy measures using a set of simple masks 
(vertical, horizontal, diagonal and anti-diagonal) 
[Wang, 1986]. Authors like Tamura [Tamura, 1978]
made an attempt at defining a set of visually relevant
texture features. These include coarseness, contrast,
directionality, line-likeness, regularity and roughness.
Pentland [Pentland, 1984] reports a high degree of 
correlation between fractal dimensions and human 
estimates of roughness. Two-state Markov models 
have been used to detect texture edges characterized by 
changes in first order statistics [Huang, 1984]. Gabor 
filters have been used in several image analysis 
applications including texture classification and 
segmentation [Bovik, 1990]. 

Specifically, computer vision techniques involving
texture analysis have been applied to health
informatics to predict and characterize skin diseases.
N. K. Al abbadi et al. [Alabbadi, 2008] have proposed
a method for skin texture recognition using a three-layer
neural network with both skin color and texture
features. In [Rubegni, 2002] the authors propose a method
for diagnosis of pigmented skin lesions that uses a
digital dermoscopy analyzer to evaluate a series of
clinically atypical, flat pigmented skin lesions. Fractal
parameters such as lacunarity and fractal dimensions
have been used in the diagnosis of skin cancers
[Blackledge, 2009]. The use of Bayesian networks for
skin texture recognition has been reported in
[Shahreza, 2008]. A review of image analysis
techniques for medical diagnosis can be found in
[Muller, 2004].


3. Proposed Approach 
3.1 GLCM: An Overview


This section describes how a popular texture modeling
technique called Grey Level Co-occurrence Matrix
(GLCM) can be used to model texture content in
images. A GLCM [Haralick, 1979] indicates the
probability of a grey-level i occurring in the
neighbourhood of grey-level j at a distance d and
direction θ:

$$G = P(i, j \mid d, \theta) \tag{1}$$

GLCMs can be computed from texture images using
different values of d and θ, and these probability values
create the co-occurrence matrix. Consider a 4 by 4
section I of an image having four grey-level intensities
as shown below,





$$I = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 2 & 2 & 2 \\ 2 & 2 & 3 & 3 \end{bmatrix}$$


To compute the frequency of one grey tone in the
neighbourhood of others, a 4 x 4 matrix is formed (since
there are four distinct grey tones) and sequential numbers
along the left (reference) and top (neighbour) are used to
indicate them. The frequency with which each pair
(reference-neighbour) of grey tones occurs together in I is
now computed, i.e. for a reference grey tone i, how many
times the neighbour grey tone j occurs near it within I,
and this constitutes the (i,j)-th element of the GLCM matrix
G. For simplicity's sake we consider the distance d as 1,
i.e. only adjacent pixels are considered, and the angle θ as 0°,
i.e. along the positive x-axis from left to right.


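$$G = \begin{array}{c|cccc} & 0 & 1 & 2 & 3 \\ \hline 0 & 2 & 2 & 1 & 0 \\ 1 & 0 & 2 & 0 & 0 \\ 2 & 0 & 0 & 3 & 1 \\ 3 & 0 & 0 & 0 & 1 \end{array}$$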


For example, 0 (reference) adjacent to 0 (neighbour) in
I occurs 2 times (rows 1 and 2), hence we put 2 at
position (0,0) of G; 0 adjacent to 1 occurs 2 times (rows
1 and 2), hence (0,1) contains 2; 0 adjacent to 2 occurs 1
time (row 3), hence (0,2) contains 1; and so on. This
procedure is repeated for all pairs of intensities.





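If we had moved along the negative x-axis, i.e. we had
looked from right to left, then the matrix formed would
have been the transpose matrix $G^{T}$. To make the matrix
independent of this factor, the transpose is added to the
original to make it symmetrical, viz. $S = G + G^{T}$:

$$S = G + G^{T} = \begin{bmatrix} 2 & 2 & 1 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix} + \begin{bmatrix} 2 & 0 & 0 & 0 \\ 2 & 2 & 0 & 0 \\ 1 & 0 & 3 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 2 & 1 & 0 \\ 2 & 4 & 0 & 0 \\ 1 & 0 & 6 & 1 \\ 0 & 0 & 1 & 2 \end{bmatrix}$$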


The symmetrical GLCM is finally normalized by
dividing each element by the sum of all elements to
form $S_0$. The '0' in the subscript indicates angle θ = 0°.
Directional GLCMs can also be computed along three
other directions: vertical (θ = 90°), right diagonal (θ =
45°) and left diagonal (θ = 135°), generating matrices
$S_{45}$, $S_{90}$ and $S_{135}$.
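
The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `glcm`, the `OFFSETS` table and the pixel offsets chosen for 45°, 90° and 135° (with the row index increasing downwards) are hypothetical and are not taken from the paper, which spells out only the 0° case.

```python
# Pixel offsets (row step, column step) for d = 1. The 0 degree entry follows
# the text (left to right); the other three are assumed conventions.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, angle, levels):
    """Return the normalized symmetrical GLCM of `image` for one direction."""
    dr, dc = OFFSETS[angle]
    rows, cols = len(image), len(image[0])
    g = [[0] * levels for _ in range(levels)]
    # Count how often neighbour grey tone j occurs next to reference tone i.
    for r in range(rows):
        for c in range(cols):
            rn, cn = r + dr, c + dc
            if 0 <= rn < rows and 0 <= cn < cols:
                g[image[r][c]][image[rn][cn]] += 1
    # Add the transpose so that the matrix becomes symmetrical: S = G + G^T.
    s = [[g[i][j] + g[j][i] for j in range(levels)] for i in range(levels)]
    total = sum(map(sum, s))
    # Normalize so that all elements sum to 1.
    return [[v / total for v in row] for row in s]


example = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [0, 2, 2, 2],
           [2, 2, 3, 3]]

S0 = glcm(example, 0, levels=4)
# The elements of S in the text sum to 24, so S0 is S divided by 24.
print([round(v * 24) for v in S0[0]])  # first row of S: [4, 2, 1, 0]
```

Calling `glcm(example, 90, levels=4)` and so on gives the other directional matrices under the assumed offsets.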


3.2 GLCM based Features


An 8-bit grayscale image typically contains 256
different grey tones, which generates GLCMs having
256 by 256 elements. Since it is inconvenient to deal
with such large matrices, a set of scalar features is
usually derived from directional normalized
symmetrical GLCMs and used for texture
characterization, viz. GLCM Contrast (C), GLCM
Homogeneity (H), GLCM Mean (M), GLCM Variance
(V) and GLCM Energy (N) as defined in Eq. (2). Here
$S_{ij}$ represents the element (i,j) of a normalized
symmetrical GLCM, and k the number of grey levels.
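
For reference, these five features are conventionally defined in the GLCM literature as below. This is a sketch of the standard forms rather than a reproduction of Eq. (2): the index range 0 to k-1 is an assumption, and some texts define energy as the square root of the sum shown here.

$$
\begin{aligned}
C &= \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} S_{ij}\,(i-j)^{2}, &
H &= \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} \frac{S_{ij}}{1+(i-j)^{2}}, \\
M &= \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} i\,S_{ij}, &
V &= \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} S_{ij}\,(i-M)^{2}, \\
N &= \sum_{i=0}^{k-1}\sum_{j=0}^{k-1} S_{ij}^{2}
\end{aligned}
$$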





3.3 GLCM based Classification 


A texture class i consists of a set of n member images:
$T_i = \{t_1, t_2, \dots, t_n\}$. For each member image, four directional
symmetrical normalized GLCMs are computed as
indicated below:

$$\{S_0^{t_1}, S_{45}^{t_1}, S_{90}^{t_1}, S_{135}^{t_1}\}, \dots, \{S_0^{t_n}, S_{45}^{t_n}, S_{90}^{t_n}, S_{135}^{t_n}\}$$

For each directional GLCM, the features in Eq. (2) are
computed. Each feature is averaged over the four
directional GLCMs, for each member image, viz.

$$\bar{X}^{t} = \frac{X_0^{t} + X_{45}^{t} + X_{90}^{t} + X_{135}^{t}}{4}$$

where $t \in \{t_1, \dots, t_n\}$ and $X \in \{C, H, M, N, V\}$.
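
The feature computation and directional averaging can be sketched as follows. The formulas in `glcm_features` are the conventional definitions of contrast, homogeneity, GLCM mean, GLCM variance and energy; treating them as the content of Eq. (2) is an assumption, and the function names are hypothetical.

```python
def glcm_features(s):
    """Return the features C, H, M, V, N of one normalized symmetrical GLCM."""
    k = len(s)
    cells = [(i, j, s[i][j]) for i in range(k) for j in range(k)]
    mean = sum(i * p for i, j, p in cells)
    return {
        "C": sum(p * (i - j) ** 2 for i, j, p in cells),        # contrast
        "H": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),  # homogeneity
        "M": mean,                                              # GLCM mean
        "V": sum(p * (i - mean) ** 2 for i, j, p in cells),     # GLCM variance
        "N": sum(p * p for i, j, p in cells),                   # energy
    }


def average_features(directional_glcms):
    """Average each feature over the directional GLCMs of one image."""
    per_direction = [glcm_features(s) for s in directional_glcms]
    return {x: sum(f[x] for f in per_direction) / len(per_direction)
            for x in "CHMNV"}


# Normalized symmetrical GLCM of the 4 by 4 example for 0 degrees
# (the symmetrical matrix S divided by the sum of its elements, 24).
S = [[4, 2, 1, 0], [2, 4, 0, 0], [1, 0, 6, 1], [0, 0, 1, 2]]
S0 = [[v / 24 for v in row] for row in S]
features = glcm_features(S0)
print({x: round(v, 3) for x, v in features.items()})
```

For the example matrix this gives C ≈ 0.583, H ≈ 0.808, M ≈ 1.292, V ≈ 1.040 and N ≈ 0.146. In practice `average_features` would receive the four directional matrices of one image.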


A texture class is characterized by the collection of its
feature values obtained during a training phase. A test
image, with its computed average features, is
said to belong to a specific texture class if the
probability of its feature values being a member of that
training class is maximum. To compute class
probability, neural network classifiers (multi-layer
perceptrons, MLP) using feed-forward
back-propagation architectures are employed in this work.





3.4 Neural Networks : An Overview 


Artificial neural networks are a set of algorithms meant
to simulate the workings of human nerve cells, or neurons.
A neuron receives input stimuli from a number of
sources using an electro-chemical process and produces
a response ('fires') when the concentration of electrical
charges exceeds a certain threshold. An artificial neural
unit is likewise visualized as a structure having n inputs, where
each input channel i can carry a signal $x_i$. The neural
unit produces an output o when the sum of the input
signals reaches a certain pre-defined threshold θ, i.e.
when $x_1 + x_2 + \dots + x_n \ge \theta$. See Fig. 1.


Fig. 1. Artificial neuron unit

One of the earliest neurons is the McCulloch-Pitts
neuron, where the inputs and outputs are considered as
binary values. Fig. 2 shows simple implementations of
'AND' and 'OR' logical gates using McCulloch-Pitts
neurons having thresholds of 2 and 1 respectively.


Fig. 2. Using McCulloch-Pitts neurons to implement AND
and OR logical gates, with inputs x1 = [0,0,1,1] and
x2 = [0,1,0,1]; the AND gate output is o = [0,0,0,1]
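
The two gates can be reproduced with a minimal sketch of a threshold unit. The function name `mcculloch_pitts` is hypothetical; the thresholds of 2 and 1 and the input sequences are the ones given in the text.

```python
def mcculloch_pitts(inputs, threshold):
    """Fire (output 1) when the sum of the binary inputs reaches the threshold."""
    return 1 if sum(inputs) >= threshold else 0


x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]

and_out = [mcculloch_pitts((a, b), threshold=2) for a, b in zip(x1, x2)]
or_out = [mcculloch_pitts((a, b), threshold=1) for a, b in zip(x1, x2)]
print(and_out)  # [0, 0, 0, 1]
print(or_out)   # [0, 1, 1, 1]
```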


The perceptron was proposed as a more generalized
model of the McCulloch-Pitts neuron and is nowadays
extensively used in classification problems in various
domains. The essential difference is the presence of
numerical weights associated with the input lines. See Fig.
3. This gives the perceptron the capability to learn by
adapting the weights to suit a particular classification
problem. The input signals and weights are no
longer restricted to binary values but can take on any
real value. The input signals are designated by an input
vector $X = \{x_1, \dots, x_n\}$ and the weights by a weight
vector $W = \{w_1, \dots, w_n\}$. The perceptron also has a bias
line connected to a bias signal (B) kept permanently at 1
and an associated bias weight (b). The net input N to the
perceptron is given by:

$$N = b + W \cdot X^{T} = b + \sum_{i=1}^{n} w_i x_i \tag{3}$$


The output O produced by the perceptron is no longer
determined by a threshold value but by a transfer
function f. In most cases the transfer function is of the
log-sigmoid or the tan-sigmoid forms depicted below:
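
In their conventional form, with N the net input of Eq. (3), the two transfer functions can be written as follows. Numbering the relation as Eq. (4) is an assumption based on the later reference to that equation.

$$O = f(N), \qquad f_{\mathrm{logsig}}(N) = \frac{1}{1 + e^{-N}}, \qquad f_{\mathrm{tansig}}(N) = \frac{2}{1 + e^{-2N}} - 1 = \tanh(N) \tag{4}$$

The log-sigmoid maps any net input to the range 0 to 1, while the tan-sigmoid maps it to the range -1 to 1.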

Fig. 3. The perceptron


Solving a problem requires two steps: a training phase
and a testing phase. During the training phase, a set of
inputs is fed to the neural unit, and the outputs they
should produce are also known; these are called
targets. For example, for simulating an AND gate an
input of [1, 0] should produce a target of [0] and an
input of [1, 1] should produce a target of [1]. The
weights are not known, so initial estimates are assumed
(often random values or all zeroes). The actual output
O produced can be calculated from Eq. (3) and Eq. (4)
above. If the output does not match the target then an
error is produced, and this error is used to modify the
weights in such a way that subsequent errors are
reduced. This constitutes an iteration and is called an
epoch. In the next iteration the inputs are again fed to
the unit and the new weights are used to calculate the
error. This process is repeated iteratively until the
errors are all reduced to zero or to some pre-defined small
value. The unit is then said to have converged and the final
weights are called the balanced weights. The balanced
weights provide a representation of the problem
pattern, since for all inputs these weights produce the
correct outputs.
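
The training loop can be illustrated on the AND gate with a minimal sketch. It uses the classic perceptron learning rule with a hard-limit transfer function, which is a simplification assumed here for brevity and not the sigmoid-based back-propagation used later in the paper; the function names and the learning rate are hypothetical.

```python
def step(net):
    """Hard-limit transfer function: 1 when the net input is positive."""
    return 1 if net > 0 else 0


def train_perceptron(samples, rate=1.0, max_epochs=100):
    """Train one perceptron; returns (weights, bias, epochs used)."""
    weights = [0.0] * len(samples[0][0])  # initial estimates: all zeroes
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, target in samples:
            # Net input: bias weight plus the weighted sum of the inputs.
            net = bias + sum(w * x for w, x in zip(weights, inputs))
            error = target - step(net)
            if error != 0:
                errors += 1
                # Shift the weights so the same input gives a smaller error.
                weights = [w + rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += rate * error  # the bias signal is fixed at 1
        if errors == 0:  # converged: every input produces its target
            return weights, bias, epoch
    return weights, bias, max_epochs


AND_GATE = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias, epochs = train_perceptron(AND_GATE)
outputs = [step(bias + sum(w * x for w, x in zip(weights, inputs)))
           for inputs, _ in AND_GATE]
print(outputs)  # [0, 0, 0, 1]
```

Because the AND problem is linearly separable, this rule is guaranteed to reach a set of balanced weights in a finite number of epochs.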


During the testing phase, an unknown set of inputs is
fed to the neural unit and the balanced weights are then
used to calculate the correct outputs. The power and
flexibility of neural units lie in the fact that the test
inputs need not be exactly identical to any of the
training set inputs but only similar, for the perceptron
to produce the correct decision. This property is
frequently used to solve problems like pattern
recognition and character recognition, where training
is done using one set of characters and testing is
done on another set of similar characters, e.g. produced
using a different font.


In most real-world problems, instead of using a single
perceptron, we use a network of connected neural units
since we want multiple outputs at the same time. Such
networks often have multiple layers of neural units
between the input and output layers, in which case they are
called multi-layer perceptrons (MLP) and the
in-between layers are called hidden layers. See Fig. 4.
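
A forward pass through such a network can be sketched as below. The layer sizes (3 inputs, 10 hidden units, 3 outputs), the random untrained weights and the input values are all made up for illustration, and the log-sigmoid is assumed as the transfer function of every unit.

```python
import math
import random


def logsig(n):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-n))


def layer_output(inputs, weights, biases):
    """Outputs of one layer; weights[u] holds the input weights of unit u."""
    return [logsig(b + sum(w * x for w, x in zip(unit, inputs)))
            for unit, b in zip(weights, biases)]


def random_layer(n_inputs, n_units, rng):
    """Hypothetical untrained layer: small random weights, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
               for _ in range(n_units)]
    return weights, [0.0] * n_units


rng = random.Random(0)
hidden = random_layer(3, 10, rng)   # 3 inputs feeding 10 hidden units
output = random_layer(10, 3, rng)   # 10 hidden units feeding 3 output units

sample = [0.2, 0.7, 0.4]            # made-up input values in the range 0 to 1
scores = layer_output(layer_output(sample, *hidden), *output)
# The output unit with the largest response gives the predicted class.
predicted = max(range(len(scores)), key=scores.__getitem__)
print(scores, predicted)
```

With trained rather than random weights, the same forward pass would be used during the testing phase.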

Fig. 4. A neural network


In MLP computations multiple errors are produced at
the output, one for each neural unit, so a
cumulative error called the Mean Square Error (MSE)
is used for updating the weights, as depicted below,
where $e_i$ is the error produced at the i-th output unit and
there are m such units at the output.
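
In its usual form, and assuming that $e_i$ is the difference between the target and the actual output of the i-th output unit, the cumulative error reads:

$$\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} e_i^{2}$$

For example, output errors of 0.2, -0.1 and 0.3 at m = 3 output units give an MSE of (0.04 + 0.01 + 0.09) / 3 ≈ 0.047.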


4. Experimentations and Results
4.1 Dataset 


Skin texture images downloaded from the Dermnet
picture collection (http://www.dermnet.com/) are used
in the experimentations. The dataset consists of a total
of 180 images divided into three disease classes: Acne
(Class-A), Ichthyosis (Class-I) and Keratosis
(Class-K), with 60 images per class. Each image is 128 by 128
pixels in dimension and in GIF file format. A total of 90
images are used as the Training set (T) and the
remaining 90 images as the Testing set (S). The images
are arranged in the order A, I, K, i.e. the first subset of 30
images of the training (or testing) set belongs to Class-A,
the next subset to Class-I and the last subset to Class-K.
Sample images of classes A, I, K are shown in Figure 5.


For computing recognition rates, features are first
considered individually, then in two-dimensional
(2-D) feature spaces and finally in three-dimensional (3-D)
feature spaces, involving multiple GLCM features
simultaneously. Comparisons between training and
testing sets are done using multi-layer perceptrons. At
each stage the best results are tabulated and the
corresponding discrimination plots are reproduced.
The legends used in this work are listed in Table 1. Here
X denotes a class name which can be either A or I or K.

Fig. 5. Samples of medical images belonging to three skin
condition classes: (a) A (b) I (c) K


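Table 1: Legends

TXC | Training set, Class-X, GLCM Contrast
TXH | Training set, Class-X, GLCM Homogeneity
TXM | Training set, Class-X, GLCM Mean
TXV | Training set, Class-X, GLCM Variance
TXN | Training set, Class-X, GLCM Energy
SXC | Testing set, Class-X, GLCM Contrast
SXH | Testing set, Class-X, GLCM Homogeneity
SXM | Testing set, Class-X, GLCM Mean
SXV | Testing set, Class-X, GLCM Variance
SXN | Testing set, Class-X, GLCM Energy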


4.2 Individual Features


Values of individual GLCM features C, H, M, N, V
defined in Eq. (2) for training and testing images for
each of the three classes are computed. Test images are
compared to training clusters using NN classifiers.
Accuracy results are summarized in Table 2. The first
column depicts the feature used; the second column
shows the neural network configuration (NNC), e.g.
1-10-3 indicates 1 input unit (for the individual feature),
10 units in the hidden layer and 3 units in the output
layer (corresponding to the 3 classes to be
distinguished). The third, fourth and fifth columns
indicate the percentage recognition accuracies for the
three classes, the sixth column provides the overall
accuracy for the three classes and the last column
indicates the best Mean Square Error (MSE) obtained
during the training phase of the NNs. In all cases the
NN classifiers were run for 50000 epochs. Feature
values were appropriately scaled to lie within the range
0 to 1 before being fed to the classifier.
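
The scaling method is not specified; one common choice, assumed here purely for illustration, is linear min-max scaling of each feature. The function name and the sample values are hypothetical.

```python
def min_max_scale(values):
    """Linearly map values so the smallest becomes 0 and the largest 1."""
    low, high = min(values), max(values)
    if high == low:  # constant feature: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


raw_mean = [31.0, 95.5, 160.0, 63.25]  # made-up GLCM Mean values
print(min_max_scale(raw_mean))          # [0.0, 0.5, 1.0, 0.25]
```

Scaling keeps features with very different numeric ranges comparable when several of them are fed to the same network.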





Table 2: Accuracy results for single features 


From Table 2 it is observed that the best results are
produced by M (75.5%) and C (55.5%). Corresponding
plots depicting the variation of these feature values for 
the three classes over the training and testing datasets 
are shown below. 

Fig. 6. Feature plots for (a) TAM, TIM, TKM, SAM, SIM,
SKM (b) TAC, TIC, TKC, SAC, SIC, SKC


4.3 Joint Features in 2-D Feature Space 


To improve upon the results obtained using individual 
features, joint features are next considered in two- 
dimensional feature spaces i.e. C-H, C-M, C-N, C-V, H- 
M, H-N, H-V, M-N, M-V, N-V. Accuracy results are 
summarized in Table 3. In all cases the NN classifiers 
were run for 50000 epochs. Feature values were 
appropriately scaled to lie within the range 0 to 1. 
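
The ten joint feature spaces listed above are simply all unordered pairs of the five features. A short sketch, with hypothetical variable names, enumerates them together with the corresponding triples for three-dimensional feature spaces:

```python
from itertools import combinations

FEATURES = "CHMNV"  # Contrast, Homogeneity, Mean, Energy (N), Variance

pairs = ["-".join(c) for c in combinations(FEATURES, 2)]
triples = ["-".join(c) for c in combinations(FEATURES, 3)]
print(len(pairs), pairs)
print(len(triples), triples)
```

Both lists contain 10 entries, since choosing 2 or 3 items out of 5 gives the same count.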





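Table 3 : Accuracy results for joint 2D features

Feature | K (%) | Overall (%) | MSE
...     | 46.6  | 48.9        | 0.10
...     | 43.3  | 53.3        | 0.09
H-M     | 63.3  | 77.8        | 0.05
H-N     | 43.3  | 33.3        | 0.08
H-V     | 40    | 35.5        | 0.13
M-N     | ...   | ...         | 0.06
M-V     | ...   | ...         | 0.06
N-V     | ...   | ...         | 0.12
Average: A 58.3, I 56.6, K 48.9, Overall 54.6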


From Table 3 it is observed that best results are 
produced by M-N (80%) and H-M (77.8%). 
Corresponding plots depicting the variation of these 
feature values for the three classes over the training and 
testing datasets are shown below. 

Fig. 7. Feature plots for (a) TM vs. TN, SM vs. SN (b) TM
vs. TH, SM vs. SH


4.4 Joint Features in 3-D Feature Space 


To improve upon the results obtained using individual 
features, joint features are next considered in three- 
dimensional feature spaces i.e. C-H-M, C-H-V, C-H- 
N, C-M-V, C-M-N, C-N-V, H-M-V, H-M-N, H-N-V, 
M-N-V. Accuracy results are summarized in Table 4. In
all cases the NN classifiers were run for 50000 epochs.
Feature values were appropriately scaled to lie within
the range 0 to 1.


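Table 4: Accuracy results for joint 3D Features

Feature | K (%) | Overall (%) | MSE
C-H-M   | 63.3  | 74.4        | 0.04
C-H-V   | 36.6  | 41.4        | 0.41
C-H-N   | 66.6  | 50          | 0.06
C-M-V   | 40    | 44.4        | 0.07
C-M-N   | 66.6  | 82.2        | 0.04
C-N-V   | 46.6  | 50          | 0.09
H-M-V   | 53.3  | 75.5        | 0.04
H-M-N   | 53.3  | 73.3        | 0.05
H-N-V   | ...3  | 48.9        | 0.08
M-N-V   | 60    | 72.2        | 0.04
Average |       | 61.2        |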





From Table 4 it is observed that best results are 
produced by C-M-N (82.2%) and H-M-V (75.5%). 
Corresponding plots depicting the variation of these 
feature values for the three classes over the training and 
testing datasets are shown below. 

Fig. 8. Feature plots for (a) TC vs. TM vs. TN, SC vs. SM
vs. SN (b) TH vs. TM vs. TV, SH vs. SM vs. SV


5. Analysis


Automated discrimination between three skin texture 
classes was done using a variety of approaches to find 
the optimum results. Out of five GLCM based features 
considered individually, M produced the best 
recognition rate of 75.5%. Among joint 2-D feature
spaces, M-N produced the best result of 80%. Joint 3-D
feature spaces were seen to improve the accuracy
rate to 82.2% using C-M-N. The best results are
summarized below.


Table 5 : Best performance results 


Feature space   | Individual | Joint 2D | Joint 3D
Best feature(s) | M          | M-N      | C-M-N
Accuracy (%)    | 75.5       | 80.0     | 82.2

Out of the disease classes, I was the best recognized
using single features (60.6%), while A was the best
recognized using 2-D features (58.3%) and 3-D
features (70.3%).


The best recognition result, using a combination of the
Contrast, Mean and Energy features, was obtained with
a neural network having a configuration of
3-150-3, i.e. 3 input units, 150 units in the hidden layer
and 3 output units. The three feature vectors C, M, N of
each training sample were fed to the three inputs of the
neural net, and it was trained in this manner on a total of
90 samples, 30 per class. The network took 50000
epochs to converge to an MSE value of 0.04. The
network convergence and output plots are shown
below. The output plot depicts an 86.6% accuracy for
recognizing disease class A, 93.3% for I and 66.6% for K,
each class being represented by 30 test samples.
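
The three per-class figures can be checked against the overall accuracy of 82.2% with a short calculation. The sketch below assumes that each percentage is a truncated fraction of the 30 test samples in that class; the variable names are hypothetical.

```python
SAMPLES_PER_CLASS = 30
reported = [86.6, 93.3, 66.6]  # per-class accuracies (%), in class order

# Recover the number of correctly recognized test images in each class.
correct = [round(p / 100 * SAMPLES_PER_CLASS) for p in reported]
overall = 100 * sum(correct) / (SAMPLES_PER_CLASS * len(reported))
print(correct, round(overall, 1))  # [26, 28, 20] 82.2
```

That is, 74 of the 90 test images are recognized correctly, which matches the reported overall rate.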


Fig. 9. NN convergence and output plots for the C-M-N
feature vector


To put the above results in perspective with the state of
the art, the best result reported in [Shyu, 1999] for
identifying disease classes from 302 lung section
images, involving the texture features homogeneity,
contrast, correlation and cluster in addition to other
features like grey-level histogram, is 76.3%. Accuracy
for classification of 800 endoscopic images in [Xia,
2005] using a fusion of color, texture and shape features
ranges from 77% to 90%, but only about 25% involving
texture features alone. The accuracy reported in
[Alabbadi, 2008], tested on 300 skin texture images, is
96%, but it uses 9 color features in addition to 4 texture
features: entropy, energy, contrast and homogeneity.





6. Conclusions and Future Scopes 


This paper proposes an automated system for 
recognizing disease conditions of human skin by 
analyzing skin texture images using texture 
recognition techniques. Skin disease conditions differ 
in appearance in a way which cannot be modeled 
appropriately by specific colors and can best be 
identified using statistical variation of texture. Such 
automated medical diagnosis systems can prove 
extremely useful where there might be a dearth of 
good medical professionals. On one hand this would 
be useful for dermatologists to reduce diagnostic 
errors, while on the other it can serve as the initial test 
bed for patients before seeking expert advice. 

The study reveals that performance improved when
joint features were considered as compared to
individual features. A salient feature of this approach is
the low-complexity data modeling scheme, whereby a
small number of scalar values are used to represent
image content instead of multi-dimensional vectors
like histograms. This keeps the computational
overhead of the system low and makes it suitable for use
in remote and rural sectors, where computational
resources can be scarce. Low resource requirements also
imply low costs.















The accuracy of the current system is comparable to
those reported in contemporary works. Most of the
other works have dealt with 24-bit images and have
utilized color based features in addition to texture
based features. In comparison, the current work has
used 8-bit images and only texture based features. It is
expected that accuracy results can be improved upon
by using: (1) color features along with texture, by
employing GLCMs on the individual R, G, B color
channels of 24-bit images; (2) normalization of the
brightness and contrast of the images by
pre-processing, involving histogram equalization, before
calculation of features.